
[libc] More memory allocation changes for 8086 toolchain #2140

Merged · 1 commit into master · Dec 17, 2024
Conversation

ghaerr
Owner

@ghaerr ghaerr commented Dec 17, 2024

The last of the bug fixes and enhancements to our new arena-based malloc for the 8086 toolchain. These enhancements currently work only for OpenWatcom C and the large/compact models (i.e. 32-bit data pointers).

@rafael2k, the whole arena malloc is a bit complicated to explain, but basically a single wrapper file, mem.c (see below), will be used in each tool to fully encapsulate malloc, realloc and free. There can be no ifdefs or renames of malloc to memalloc in any tool; those will all have to be removed or replaced with what was there originally (easily done by just deleting the ifdef ELKS portions entirely). The reason for this is that the C library routines themselves call malloc, so we need to replace all memory allocation calls: a pointer returned from the C library's malloc could be passed around inside a tool and then handed to our renamed malloc, which would cause a problem. With the full wrapper, even the C library routines that call malloc end up in our allocator. There are lots of other linker issues with the multiple memory allocators we now have available, but I think I've got it all straightened out to build and link automatically.

So, long story short, the following file mem.c will be added to each tool's project Makefile. Hopefully we will not need to customize the arena vs fmemalloc threshold (MALLOC_ARENA_THRESH), since the default small heap is 64K, but this can now be done programmatically through an extern int malloc_arena_thresh instead of changing the mem.c source code. For now, the arena allocator allocates a maximum of 65520 bytes from main memory and then subdivides that for any allocations of <= 1K bytes each, with the rest going to fmemalloc. The other good news is that by setting the following in a shell script before running a tool, a full heap analysis showing every allocation will be dumped:

# sysctl malloc.debug=3   (turn on debug malloc output to level 3 max)
# sysctl kern.debug=1     (turn on debug kernel sbrk/fmemalloc display)

Since it's a bit complicated at the moment, I'll take a first pass by pulling down your repo (I've saved my compiler bug fixes for later) and see if I can get all the tools running with the new allocator. I'll then post a PR for your review, and we can go from there with regard to tuning.

The following mem.c file wrapper must be included in each tool, as it can't go in the C library directly. The net effect is to force malloc, free and realloc calls, whether from the tool or from the C library, to use the arena allocator and fmemalloc instead of the default malloc/free/realloc.

/* malloc/free wholesale replacement for 8086 toolchain */
#include <stdlib.h>
#include <string.h>     /* memcpy */

#define MALLOC_ARENA_SIZE   65520U  /* size of initial arena fmemalloc (max 65520)*/
#define MALLOC_ARENA_THRESH 1024U   /* max size to allocate from arena-managed heap */

unsigned int malloc_arena_size = MALLOC_ARENA_SIZE;
unsigned int malloc_arena_thresh = MALLOC_ARENA_THRESH;

#define FP_SEG(fp)          ((unsigned)((unsigned long)(void __far *)(fp) >> 16))
#define FP_OFF(fp)          ((unsigned)(unsigned long)(void __far *)(fp))

static void __far *heap;

void *malloc(size_t size)
{
    char *p;

    if (heap == NULL) {
        heap = fmemalloc(malloc_arena_size);
        __amalloc_add_heap(heap, malloc_arena_size);
    }

    if (size <= malloc_arena_thresh)
        p = __amalloc(size);
    else
        p = fmemalloc(size);
    return p;
}

void free(void *ptr)
{
    if (ptr == NULL)
        return;
    if (FP_OFF(ptr) == 0)       /* non-arena pointer */
        fmemfree(ptr);
    else
        __afree(ptr);
}

void *realloc(void *ptr, size_t size)
{
    void *new;
    size_t osize = size;

    if (ptr == 0)
        return malloc(size);

#if LATER
    /* we can't yet get size from fmemalloc'd block */
    osize = malloc_usable_size(ptr);
    if (size <= osize)
        osize = size;           /* copy fewer bytes in memcpy below */
#endif

    new = malloc(size);
    if (new == 0)
        return 0;
    memcpy(new, ptr, osize);    /* FIXME copies too much but can't get real osize */
    free(ptr);
    return new;
}

If you have more questions, please ask, thanks!

@ghaerr ghaerr merged commit 880b914 into master Dec 17, 2024
2 checks passed
@ghaerr ghaerr deleted the malloc2 branch December 17, 2024 05:24
@rafael2k
Contributor

This is massive! Cool.
I can test here and check whether we can still run all the tools.

@toncho11
Contributor

So this will allow ELKS native apps to use more than 64kb?

@rafael2k
Contributor

So this will allow ELKS native apps to use more than 64kb?

When using the OW toolchain (which allows the large memory model), yes. For other toolchains, @ghaerr will know.

@ghaerr
Owner Author

ghaerr commented Dec 17, 2024

So this will allow ELKS native apps to use more than 64kb?

All applications built with OWC in large model have the ability to use more than 64k code or data.

With an 8086, up to 64k of code can be accessed with the CS register, and likewise an additional 64k of data can be accessed with the DS register. In small model, the CS and DS registers are set once at program startup and thus the program size is limited to 64k code and 64k data. In large model, a data pointer is 32 bit and holds a separate DS value in the top 16 bits along with the lower 16 bits pointing to 64k of data within that "segment". Thus, it is possible that a far pointer can actually use one of 2^16 = 65536 segments, each pointing to 64k of data (Yes, these segments would overlap, that's another discussion).

The issue this PR fixes has to do with the notion of the "default data segment". That is, even with large model programs where the DS register can be set to point to anything, there is only one "default data segment" where the program stack, statically declared data and string literal variables reside. It fills up quickly with big programs. In addition, the normal "heap" is contained in this same default data segment, and uses the remaining space left up to 64k AFTER all the stack, data and literals are added. That can be quite small with large programs, and that's the case we have with the large toolchain programs.

The normal storage allocator, malloc(), allocates data only from the default data segment, that's the big problem. One can use fmemalloc to allocate from anywhere in memory, but it costs 16 bytes per allocation in the kernel near data segment, which is also subject to the same 64k limitations.

In the 8086 toolchain, the default data segment was pretty full. What the arena allocator does is allow the default memory allocator (malloc) to allocate from outside the default data segment, out of a separate, new data segment which is 64k to start. So there are many more slots available to hold the hundreds or possibly thousands of allocations the toolchain requires.

This particular first implementation only allows a single separate 64k "arena". We will see if that is enough. I have plans to expand it to allow an unlimited number of additional 64k "heap arenas" from which to allocate memory if this is not enough. The overhead for an allocation in the arena allocator is only two bytes, versus 16 for fmemalloc.

@rafael2k
Contributor

@ghaerr, should I PR the memory changes in the toolchain using this new memory allocation strategy, or wait for you to PR first so there are no merge conflicts?

@ghaerr
Copy link
Owner Author

ghaerr commented Dec 17, 2024 via email
